In this section I import the data and prepare it for analysis.
Code
```{r setup}#| warning: false#| message: falselibrary(tidyverse)library(plotly)library(dplyr)library(tidyr)library(knitr)library(table1) #Create HTML Tables of Descriptive Statistics https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html#library(OMTM1) #https://github.com/schildjs/OMTM1/library(Hmisc)library(rms) # Regression Modeling Strategies by Frank https://cran.r-project.org/web/packages/rms/index.htmllibrary(modelsummary) #Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready https://cran.r-project.org/web/packages/modelsummary/index.htmllibrary(scales) # The scales packages provides the internal scaling infrastructure used by ggplot2, and gives you tools to override the default breaks, labels, transformations and palettes. https://scales.r-lib.orglibrary(viridis) #colorslibrary(cowplot) #allows me to use plotgridlibrary(gridExtra) #adding tables to plotslibrary(visdat) #shows missing datalibrary(GGally) #makes pairs plotslibrary(sandwich) #for robust standard errorssetwd("/Users/lisalevoir/BIOS7351_Collab/data_project2") #this line I would need to run in the consoleknitr::opts_knit$set(root.dir ="/Users/lisalevoir/BIOS7351_Collab/github/BIOS_Collaboration/project_2_analysis") #now I set global options for knitting, I also had to toggle global options > R Markdown > evaluate chunks in current directory#import the datadat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/combined_data_203.csv")#to compare that the merging went as expectedVUMSdat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/DATA_VUMS.csv")HMSdat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/DATA_HMS.csv")UVAdat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/DATA_UVA.csv")```
1.0.1 Inclusion/Exclusion
Criteria to exclude students who most likely took a scored exam:
Any PhD students (n = 2)
Any 5th year program students (n = 19)
M4 students at Vanderbilt (n = 5)
Students who did not complete either Step survey (n = 2)
Students who specifically stated they took a scored Step 1 (n=1)
Based on our criteria we would exclude record IDs:
Data quality: there are r length(step1_complete$uniqueID) unique individuals included in the step 1 data. Below is a table of their inclusion by school:
Code
kable(table(step1_complete$schoolid))
Var1
Freq
HMS
170
UVa
196
VU
171
[1] "uworld_percent_step2_1" "uworld_percent_step2_2"
[1] "amboss_percent_step2_1" "amboss_percent_step2_2"
[1] "length_step2_1" "length_step2_2"
[1] "practicetest_step2_1" "practicetest_step2_2"
[1] "full_test_practice_step2_1" "full_test_practice_step2_2"
[1] "practice_score_step2_1" "practice_score_step2_2"
[1] "practice_test_step2_1" "practice_test_step2_2"
[1] "resources_step2_1___1" "resources_step2_1___2" "resources_step2_1___3"
[4] "resources_step2_1___4" "resources_step2_1___5" "resources_step2_1___6"
[7] "resources_step2_1___7" "resources_step2_1___8" "resources_step2_2___1"
[10] "resources_step2_2___2" "resources_step2_2___3" "resources_step2_2___4"
[13] "resources_step2_2___5" "resources_step2_2___6" "resources_step2_2___7"
[16] "resources_step2_2___8"
[1] "score_step2_1" "score_step2_2"
[1] "target_score_step2_1" "target_score_step2_2"
[1] "Above is a list of columns I have combined for you. Hope it looks right!"
Notice that unfortunately, we had to drop 138 individuals who did not report a step 2 score. These raw counts and frequencies of people who did not give a step 2 score (and are therefor not eligible for analysis) are listed by institution below.
Code
# describing the missingnesskable(table(drop_these_with_no_outcome$schoolid), caption ="Number of missing Step 2 scores by institution")
Number of missing Step 2 scores by institution
Var1
Freq
HMS
41
UVa
52
VU
45
Code
num <-as.vector(table(drop_these_with_no_outcome$schoolid))denom <-as.vector(table(step2_complete$schoolid))nonresponse_freq <-setNames(c(round(num/denom, 3)), c(names(table(step2_complete$schoolid))))kable(nonresponse_freq, caption ="frequency of missing step 2 scores by instition")
Note, we plan to use the robust/sandwich variance estimator for regression models. One inclusion criteria is that the outcome variable (“Y”) must be available for a subject to be included in the analysis question (ie. if they did not report a step 2 score, we won’t perform relevant step 2 analysis on them).
summary(exam_order_mod) #model summary for reporting
Call:
lm(formula = score_step2 ~ order_factor + school_factor, data = step2_complete)
Residuals:
Min 1Q Median 3Q Max
-112.816 -5.523 3.184 8.377 21.759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 261.1405 1.4589 178.992 <2e-16 ***
order_factorstep 2 first -1.8994 1.4832 -1.281 0.201
order_factoronly step 2 -6.9279 7.4321 -0.932 0.352
school_factorUVA 0.3822 1.7843 0.214 0.831
school_factorVUMS 0.5748 1.8396 0.312 0.755
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.69 on 394 degrees of freedom
Multiple R-squared: 0.00609, Adjusted R-squared: -0.004
F-statistic: 0.6035 on 4 and 394 DF, p-value: 0.6603
Based on the model output (p values), there doesn’t appear to be any signifigant associations between exam order and the score for Step 2. Since the R^2 value is essentially 0, I conclude that there was no effect of exam order on step 2 scores.
Again, we cannot analyze Step 1 scores since all respondents reported passing.
Based on our SAP, if there are any covariates with more than 30% of responses missing, we will drop that variable or populate it with 0, depending on context. For example, the percent of Amboss questions completed will be filled with 0 for people who didn’t answer since it seems safe to assume they didn’t complete any of the Amboss questions. If less than 30% are missing, I may consider performing bootstrap sampling of known values to replace missing values.
After accounting for missingness, I will assess for co-linearity of the predictors (ie. correlation) using VIF. If there is high co-linearity, we will use LASSO to perform variable selection. If there is no evidence of concerning levels of colinearity, I will proceed with linear regression.
Code
######## profile missingness in the step 2 data and addresspercents_missing <-round(colSums(is.na(step2_complete))/nrow(step2_complete), 3)*100kable(percents_missing, caption ="Percent missing observations for pooled Step 2 survey")
Percent missing observations for pooled Step 2 survey
x
record_id
0.0
schoolid
0.0
exam_order
0.0
uworld_percent_step2
1.5
amboss_percent_step2
75.2
length_step2
1.8
practicetest_step2
0.0
full_test_practice_step2
0.0
practice_score_step2
27.8
practice_test_step2
27.1
resources_step2
0.0
score_step2
0.0
target_score_step2
0.0
order_factor
0.0
school_factor
0.0
Code
#inspecting percent missing, it seems like most responses are now complete except Amboss. Based on our study plan, I will populate those without a response for Amboss with 0'sstep2_complete$amboss_percent_step2 <-ifelse(is.na(step2_complete$amboss_percent_step2) ==TRUE, 0, step2_complete$amboss_percent_step2)class(step2_complete$practicetest_step2) <-"integer"step2_complete[,"on_target"] <-factor(step2_complete$target_score_step2, levels =c(1,2,3), labels =c("at target", "above target", "below target"))step2_complete[, "practice_test_2_clean"] <-ifelse(is.na(step2_complete$practicetest_step2) ==TRUE, NA, substr(step2_complete$practicetest_step2, start =1, stop =1))step2_complete[, "number_of_practice_tests"] <-as.numeric(step2_complete$practice_test_2_clean)
Multiple linear regression with:
Y = Step 2 score
X1 = % UWorld
X2 = % Amboss
X3 = length study
X4 = # of practice tests
X5 = full test day (yes/no code as binary)
X6 = final practice score (however, is there some conversion between U World and Amboss - Jeffrey is looking into this)
Z = School (need to adjust for this)
Careful to note that not all cases are complete - for example there are 399 responses in the complete step 2 dataset, of which for the number of practice tests taken, 108 are missing and 291 have a response recorded.
Below I report the model results, sandqich variance, and VIF for step 2 scores model.
mod_step2_scores2 <-lm(score_step2 ~ uworld_percent_step2 + amboss_percent_step2 + length_step2 + simulate_full_practice + practice_score_step2 + practice_test_2_clean + school_factor, data = step2_complete)summary(mod_step2_scores2) #we need to talk about these model results
I looked into the difference between practice_test_step2 (the final practice test I took before my exam was… 8 options) and practicetest_step2 (text response of how many practice tests did you take before step 2). I decided not to include either in the model.
Note
We need to talk about the model results between the two ways I coded the variables for # of practice tests
Here, I will perform logistic regression with
Y = yes or no (1 = yes, 2 = no for “push_step1”)
There may not be sufficient data on this since only 20 people responded that they decided to push back Step 1. The factors that were measured are:
push remember step1 (1 = I only remember the form name, 2 = I only remember the score, 3 = I remember the form name and the score, 4 = I don’t remember either) - we decided not to include this variable (can change later if desired)
push score only step 1 (1 = NBME, 2 = Uworld)
push practice test step 1 ( 1 - 8 listing various exams)
push nbme practice score (from 0 to 100%)
push uw practice score (from 180 to 300)
Listing variables by name and if I have included them:
“push_step1_1” yes
“push_remember_step1_1” not included b/c a precursor question
“push_score_only_step1_1” not currently included but could be
“push_practice_test_step1_1” yes
“push_nbme_practice_score_step1_1” yes
“push_uw_practice_score_step1_1” yes
Code
step1_complete$push_step1 <-ifelse(step1_complete$push_step1 ==2|is.na(step1_complete$push_step1) ==TRUE, 2, 1) #recording the NA's to be "No" (they did not push back step 1)step1_complete$push_step1_label <-factor(step1_complete$push_step1, levels =c(1, 2), labels =c("Yes", "No")) #making a nice descriptive labeldid_push_df <- step1_complete %>%filter(push_step1_label =="Yes")dat %>%filter(!is.na(push_remember_step1_1))
X record_id school timestamp year exam_order step_spacing
1 9 7 VUSM 2/8/23 9:31 3 1 2
2 12 12 VUSM 2/8/23 10:05 5 1 1
3 23 39 VUSM 2/16/23 20:12 6 1 3
4 29 51 VUSM 2/20/23 12:57 3 1 3
5 47 9 VUSM 2/8/23 9:53 3 3 3
6 50 20 VUSM 2/8/23 11:05 3 1 3
7 60 57 VUSM 2/21/23 10:19 3 3 3
8 75 12 UVASOM 2023-08-17 11:31:21 7 1 2
9 129 67 UVASOM 2023-09-07 09:48:22 7 1 2
10 141 79 UVASOM 2023-09-07 16:02:17 7 1 2
11 174 30 HMS 2023-08-23 18:50:56 4 3 2
12 196 55 HMS 2023-10-04 08:31:07 7 1 2
13 199 58 HMS 2023-10-04 15:27:27 7 1 2
step_spacing_months survey_complete step_1_first_timestamp
1 NA 2 2/8/23 9:33
2 NA 2 2/8/23 10:11
3 5 2 2/16/23 20:14
4 3 2 2/20/23 12:59
5 7 2 2/8/23 9:56
6 5 2 2/8/23 11:19
7 6 2 2/21/23 10:26
8 NA 2 2023-08-17 11:32:55
9 NA 2 2023-09-07 09:49:48
10 NA 2 2023-09-07 16:04:58
11 NA 2 2023-08-23 19:10:48
12 NA 2 2023-10-04 08:37:24
13 NA 2 2023-10-04 15:29:01
resources_step1_1___1 resources_step1_1___2 resources_step1_1___3
1 1 1 0
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 0
6 1 1 1
7 0 0 1
8 1 1 1
9 1 1 0
10 1 1 0
11 1 1 0
12 1 1 1
13 1 1 0
resources_step1_1___4 resources_step1_1___5 resources_step1_1___6
1 0 0 0
2 1 0 0
3 1 0 1
4 1 0 1
5 1 0 0
6 1 0 0
7 1 0 0
8 0 1 0
9 0 0 0
10 0 0 0
11 1 0 0
12 1 1 1
13 0 0 0
resources_step1_1___7 resources_step1_1___8 other_resources_step1_1
1 0 0 <NA>
2 0 0 <NA>
3 1 0 <NA>
4 0 0 <NA>
5 0 0 <NA>
6 0 0 <NA>
7 0 0 <NA>
8 0 0 <NA>
9 0 0 <NA>
10 0 0 <NA>
11 0 1 Osmosis
12 1 0
13 0 0
other_step1_1 uworld_percent_step1_1 uworld_step1_1 first_aid_step1_1
1 <NA> NA 3 1
2 <NA> 60 3 1
3 <NA> 80 1 1
4 <NA> 65 1 1
5 <NA> 50 1 1
6 <NA> 69 1 1
7 <NA> NA NA NA
8 <NA> 75 1 1
9 <NA> 70 3 2
10 <NA> 35 3 1
11 Yes 30 3 2
12 90 3 1
13 86 1 1
anki_step1_1 anki_use_step1_1 anki_details_step1_1___1
1 NA NA 0
2 1 1 0
3 1 3 0
4 1 3 1
5 NA NA 0
6 1 1 1
7 1 3 1
8 1 1 0
9 NA NA 0
10 NA NA 0
11 NA NA 0
12 1 3 0
13 NA NA 0
anki_details_step1_1___2 anki_details_step1_1___3 anki_details_step1_1___4
1 0 0 0
2 0 1 0
3 1 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 1 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 1 0
13 0 0 0
anki_details_step1_1___5 anki_details_step1_1___6 anki_details_step1_1___7
1 0 0 0
2 0 1 0
3 0 1 0
4 0 1 0
5 0 0 0
6 0 0 1
7 1 1 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 1 0
13 0 0 0
sketchy_step1_1 sketchy_details_step1_1___1 sketchy_details_step1_1___2
1 NA 0 0
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 0 1
7 1 1 1
8 NA 0 0
9 NA 0 0
10 NA 0 0
11 1 0 0
12 1 1 1
13 NA 0 0
sketchy_details_step1_1___3 sketchy_details_step1_1___4
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step1_1___5 sketchy_details_step1_1___6
1 0 0
2 0 1
3 0 0
4 0 0
5 0 0
6 0 0
7 0 1
8 0 0
9 0 0
10 0 0
11 1 0
12 0 1
13 0 0
sketchy_details_step1_1___7 amboss_details_step1_1___1
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
amboss_details_step1_1___2 amboss_details_step1_1___3 amboss_library_step1_1
1 0 0 NA
2 0 0 NA
3 0 0 NA
4 0 0 NA
5 0 0 NA
6 0 0 NA
7 0 0 NA
8 1 0 1
9 0 0 NA
10 0 0 NA
11 0 0 NA
12 0 1 1
13 0 0 NA
amboss_percent_step1_1 amboss_amount_step1_1 pathoma_step1_1 bnb_step1_1
1 NA NA NA NA
2 NA NA NA NA
3 NA NA 1 1
4 NA NA 1 NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 NA NA NA NA
10 NA NA NA NA
11 NA NA NA NA
12 32 3 1 1
13 NA NA NA NA
bnb_how_step1_1___1 bnb_how_step1_1___2 bnb_how_step1_1___3
1 0 0 0
2 0 0 0
3 1 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 1 0 0
13 0 0 0
bnb_how_step1_1___4 length_step1_1 practicetest_step1_1
1 0 4 4
2 0 3 3
3 0 6 6
4 0 4 4
5 0 3 5
6 0 6 6
7 0 6 5
8 0 4 4
9 0 4 3
10 0 4 3
11 0 3 4
12 0 5 3
13 0 5 5
practice_test_amount_step1_1 full_test_practice_step1_1 push_step1_1
1 3 2 1
2 3 1 1
3 2 4 1
4 1 3 1
5 3 1 1
6 3 3 1
7 3 4 1
8 3 3 1
9 1 3 1
10 3 1 1
11 3 4 1
12 3 3 1
13 1 1 1
push_remember_step1_1 push_score_only_step1_1 push_practice_test_step1_1
1 3 NA 4
2 4 NA NA
3 4 NA NA
4 2 1 NA
5 4 NA NA
6 4 NA NA
7 3 NA 3
8 4 NA NA
9 2 1 NA
10 4 NA NA
11 2 1 NA
12 2 1 NA
13 4 NA NA
push_nbme_practice_score_step1_1 push_uw_practice_score_step1_1
1 65 NA
2 NA NA
3 NA NA
4 70 NA
5 NA NA
6 NA NA
7 80 NA
8 NA NA
9 87 NA
10 NA NA
11 83 NA
12 30 NA
13 NA NA
final_practice_step1_1 final_practice_test_step1_1
1 1 6
2 0 NA
3 0 NA
4 1 7
5 0 NA
6 0 NA
7 1 2
8 1 7
9 1 7
10 1 6
11 1 5
12 0 NA
13 0 NA
final_nbme_practice_score_step1_1 final_uw_practice_score_step1_1
1 68 NA
2 NA NA
3 NA NA
4 95 NA
5 NA NA
6 NA NA
7 97 NA
8 97 NA
9 98 NA
10 99 NA
11 92 NA
12 NA NA
13 NA NA
score_step1_1 retake_step1_1 retake_pass_step1_1 study_amount_step1_1
1 1 NA NA 3
2 1 NA NA 1
3 1 NA NA 1
4 1 NA NA 1
5 1 NA NA 2
6 1 NA NA 2
7 1 NA NA 2
8 1 NA NA 1
9 1 NA NA 1
10 1 NA NA 3
11 1 NA NA 1
12 1 NA NA 1
13 1 NA NA 2
changes_step1_1
1
2 Didn't know how to note this above, but I did step 1 anki for the entirety of my last clerkship then spent 2 weeks of dedicated studying
3
4
5 I would have started reviewing material (i.e. watching B&B, Sketchy pharm) before dedicated started so that I could devote my dedicated time almost exclusively to UWorld questions
6 I ended up taking 12 weeks. M1 was terrible for me and I didn't know how to study. So, I would go back and study more for it during M1 or M2 but even 12 weeks was pushing it for me to make up the deficit I was in
7 I wish I would have started the premade anki deck during M1 instead of at the start of M2. I finished about 50% of STEP 1 cards and 75% of STEP 2 cards by the time I took Step 1 and I think it would have been much quicker to a passing score if i had a higher percentage of STEP 1 cards mastered. I did not use UWorld at all because I found that reviewing missed questions didn't help me at all. I never remembered the info or explanations unless I memorized it with anki first. Once I had things memorized, I had no issues applying the knowledge in exam form. The only questions I missed were ones that I had never seen on anki.
8 I wish it was after pre-clinical year and not after clinical year.
9 I wish we had completed step 1 post pre clerkship with dedicated time.
10 I would focus on pathophysiology section of uworld
11 I wish I didn't listen to the advice/guidance that my school (HMS) gave me. They really emphasized use of First Aid and at first I followed their guidance (because they shared so little guidance/support overall) even though I have never been a textbook reader, and so I wasted a lot of time studying via methods that were not compatible with my learning style. I also wish I had the opportunity to study with peers or through an HMS class because I also learn much better by talking through concepts with others, but there were no classes/study groups/tutoring or really any support offered by HMS. Given how much we pay for tuition to HMS, I think we should be able to opt-in or sign up for a Step 1 prep course - at a minimum this would provide some structure to an otherwise isolating and irritating experience.
12
13 Spent more time studying during pre-clinical years
step_1_first_complete step_2_first_timestamp resources_step2_1___1
1 2 0
2 2 0
3 2 0
4 2 0
5 2 0
6 2 0
7 2 0
8 2 0
9 2 0
10 2 0
11 2 0
12 2 0
13 2 0
resources_step2_1___2 resources_step2_1___3 resources_step2_1___4
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
resources_step2_1___5 resources_step2_1___6 resources_step2_1___7
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
resources_step2_1___8 other_resources_step2_1 other_step2_1
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0 <NA> <NA>
12 0 <NA> <NA>
13 0 <NA> <NA>
uworld_percent_step2_1 uworld_step2_1 first_aid_step2_1 anki_step2_1
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 NA NA NA NA
10 NA NA NA NA
11 NA NA NA NA
12 NA NA NA NA
13 NA NA NA NA
anki_use_step2_1 anki_details_step2_1___1 anki_details_step2_1___2
1 NA 0 0
2 NA 0 0
3 NA 0 0
4 NA 0 0
5 NA 0 0
6 NA 0 0
7 NA 0 0
8 NA 0 0
9 NA 0 0
10 NA 0 0
11 NA 0 0
12 NA 0 0
13 NA 0 0
anki_details_step2_1___3 anki_details_step2_1___4 anki_details_step2_1___5
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
anki_details_step2_1___6 anki_details_step2_1___7 sketchy_step2_1
1 0 0 NA
2 0 0 NA
3 0 0 NA
4 0 0 NA
5 0 0 NA
6 0 0 NA
7 0 0 NA
8 0 0 NA
9 0 0 NA
10 0 0 NA
11 0 0 NA
12 0 0 NA
13 0 0 NA
sketchy_details_step2_1___1 sketchy_details_step2_1___2
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step2_1___3 sketchy_details_step2_1___4
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step2_1___5 sketchy_details_step2_1___6
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step2_1___7 amboss_details_step2_1___1
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
amboss_details_step2_1___2 amboss_details_step2_1___3 amboss_library_step2_1
1 0 0 NA
2 0 0 NA
3 0 0 NA
4 0 0 NA
5 0 0 NA
6 0 0 NA
7 0 0 NA
8 0 0 NA
9 0 0 NA
10 0 0 NA
11 0 0 NA
12 0 0 NA
13 0 0 NA
amboss_percent_step2_1 amboss_amount_step2_1 pathoma_step2_1 bnb_step2_1
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 NA NA NA NA
10 NA NA NA NA
11 NA NA NA NA
12 NA NA NA NA
13 NA NA NA NA
bnb_how_step2_1___1 bnb_how_step2_1___2 bnb_how_step2_1___3
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
bnb_how_step2_1___4 length_step2_1 practicetest_step2_1
1 0 NA <NA>
2 0 NA <NA>
3 0 NA <NA>
4 0 NA <NA>
5 0 NA <NA>
6 0 NA <NA>
7 0 NA <NA>
8 0 NA
9 0 NA
10 0 NA
11 0 NA <NA>
12 0 NA <NA>
13 0 NA <NA>
practice_test_amount_step2_1 full_test_practice_step2_1
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
11 NA NA
12 NA NA
13 NA NA
final_practice_step2_1 practice_test_step2_1 practice_score_step2_1
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 NA NA NA
score_step2_1 retake_step2_1 retake_pass_step2_1 target_score_step2_1
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 <NA> NA NA NA
9 <NA> NA NA NA
10 <NA> NA NA NA
11 <NA> NA NA NA
12 <NA> NA NA NA
13 <NA> NA NA NA
study_amount_step2_1 changes_step2_1 step_2_first_complete
1 NA 0
2 NA 0
3 NA 0
4 NA 0
5 NA 0
6 NA 0
7 NA 0
8 NA 0
9 NA 0
10 NA 0
11 NA 0
12 NA 0
13 NA 0
step_1_second_timestamp resources_step1_2___1 resources_step1_2___2
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
resources_step1_2___3 resources_step1_2___4 resources_step1_2___5
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
resources_step1_2___6 resources_step1_2___7 resources_step1_2___8
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
other_resources_step1_2 other_step1_2 uworld_percent_step1_2
1 <NA> <NA> NA
2 <NA> <NA> NA
3 <NA> <NA> NA
4 <NA> <NA> NA
5 <NA> <NA> NA
6 <NA> <NA> NA
7 <NA> <NA> NA
8 NA
9 NA
10 NA
11 <NA> <NA> NA
12 <NA> <NA> NA
13 <NA> <NA> NA
first_aid_step1_2 uworld_step1_2 anki_step1_2 anki_use_step1_2
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 NA NA NA NA
10 NA NA NA NA
11 NA NA NA NA
12 NA NA NA NA
13 NA NA NA NA
anki_details_step1_2___1 anki_details_step1_2___2 anki_details_step1_2___3
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
anki_details_step1_2___4 anki_details_step1_2___5 anki_details_step1_2___6
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
anki_details_step1_2___7 anki_details_step1_2___8 sketchy_step1_2
1 0 0 NA
2 0 0 NA
3 0 0 NA
4 0 0 NA
5 0 0 NA
6 0 0 NA
7 0 0 NA
8 0 0 NA
9 0 0 NA
10 0 0 NA
11 0 0 NA
12 0 0 NA
13 0 0 NA
sketchy_details_step1_2___1 sketchy_details_step1_2___2
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step1_2___3 sketchy_details_step1_2___4
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step1_2___5 sketchy_details_step1_2___6
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step1_2___7 sketchy_details_step1_2___8
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
amboss_details_step1_2___1 amboss_details_step1_2___2
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
amboss_details_step1_2___3 amboss_library_step1_2 amboss_percent_step1_2
1 0 NA NA
2 0 NA NA
3 0 NA NA
4 0 NA NA
5 0 NA NA
6 0 NA NA
7 0 NA NA
8 0 NA NA
9 0 NA NA
10 0 NA NA
11 0 NA NA
12 0 NA NA
13 0 NA NA
amboss_amount_step1_2 pathoma_step1_2 bnb_step1_2 bnb_how_step1_2___1
1 NA NA NA 0
2 NA NA NA 0
3 NA NA NA 0
4 NA NA NA 0
5 NA NA NA 0
6 NA NA NA 0
7 NA NA NA 0
8 NA NA NA 0
9 NA NA NA 0
10 NA NA NA 0
11 NA NA NA 0
12 NA NA NA 0
13 NA NA NA 0
bnb_how_step1_2___2 bnb_how_step1_2___3 bnb_how_step1_2___4 length_step1_2
1 0 0 0 NA
2 0 0 0 NA
3 0 0 0 NA
4 0 0 0 NA
5 0 0 0 NA
6 0 0 0 NA
7 0 0 0 NA
8 0 0 0 NA
9 0 0 0 NA
10 0 0 0 NA
11 0 0 0 NA
12 0 0 0 NA
13 0 0 0 NA
practicetest_step1_2 practice_test_amount_step1_2 full_test_practice_step1_2
1 <NA> NA NA
2 <NA> NA NA
3 <NA> NA NA
4 <NA> NA NA
5 <NA> NA NA
6 <NA> NA NA
7 <NA> NA NA
8 NA NA
9 NA NA
10 NA NA
11 <NA> NA NA
12 <NA> NA NA
13 <NA> NA NA
push_step1_2 push_remember_step1_2 push_score_only_step1_2
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 NA NA NA
push_practice_test_step1_2 push_nbme_practice_score_step1_2
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
11 NA NA
12 NA NA
13 NA NA
push_uw_practice_score_step1_2 final_practice_step1_2
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
11 NA NA
12 NA NA
13 NA NA
final_practice_test_step1_2 final_nbme_practice_score_step1_2
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
11 NA NA
12 NA NA
13 NA NA
final_uw_practice_score_step1_2 score_step1_2 retake_step1_2
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 NA NA NA
retake_pass_step1_2 study_amount_step1_2 changes_step1_2
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
11 NA NA <NA>
12 NA NA <NA>
13 NA NA <NA>
step_1_second_complete step_2_second_timestamp resources_step2_2___1
1 0 2/8/23 9:35 1
2 0 2/8/23 10:12 1
3 0 2/16/23 20:15 1
4 0 2/20/23 13:01 1
5 0 0
6 0 0
7 0 0
8 0 0
9 0 2023-09-07 09:50:43 1
10 0 2023-09-07 16:06:06 1
11 0 0
12 0 2023-10-04 08:42:48 1
13 0 2023-10-04 15:31:29 1
resources_step2_2___2 resources_step2_2___3 resources_step2_2___4
1 1 0 0
2 1 0 1
3 0 1 0
4 0 1 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 1 1
13 0 0 0
resources_step2_2___5 resources_step2_2___6 resources_step2_2___7
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 1 1 1
13 0 0 0
resources_step2_2___8
1 0
2 0
3 0
4 1
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 1
other_resources_step2_2 other_step2_2
1
2
3
4 Divine Intervention Podcasts Yes
5
6
7
8
9
10
11
12
13 Summary PDFs/content review I found online, youtube videos Yes
uworld_percent_step2_2 uworld_step2_2 first_aid_step2_2 anki_step2_2
1 70 1 2 NA
2 66 3 1 NA
3 65 1 NA 1
4 80 1 NA 1
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 50 1 NA NA
10 61 1 NA NA
11 NA NA NA NA
12 94 3 NA 1
13 71 1 NA NA
anki_use_step2_2 anki_details_step2_2___1 anki_details_step2_2___2
1 NA 0 0
2 NA 0 0
3 3 0 0
4 3 1 0
5 NA 0 0
6 NA 0 0
7 NA 0 0
8 NA 0 0
9 NA 0 0
10 NA 0 0
11 NA 0 0
12 3 0 0
13 NA 0 0
anki_details_step2_2___3 anki_details_step2_2___4 anki_details_step2_2___5
1 0 0 0
2 0 0 0
3 1 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 1 0 0
13 0 0 0
anki_details_step2_2___6 anki_details_step2_2___7 anki_details_step2_2___8
1 0 0 0
2 0 0 0
3 0 1 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 1 0
13 0 0 0
sketchy_step2_2 sketchy_details_step2_2___1 sketchy_details_step2_2___2
1 NA 0 0
2 1 1 0
3 NA 0 0
4 NA 0 0
5 NA 0 0
6 NA 0 0
7 NA 0 0
8 NA 0 0
9 NA 0 0
10 NA 0 0
11 NA 0 0
12 1 0 1
13 NA 0 0
sketchy_details_step2_2___3 sketchy_details_step2_2___4
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 1 0
13 0 0
sketchy_details_step2_2___5 sketchy_details_step2_2___6
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
sketchy_details_step2_2___7 sketchy_details_step2_2___8
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 1 0
13 0 0
amboss_details_step2_2___1 amboss_details_step2_2___2
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 1 1
13 0 0
amboss_details_step2_2___3 amboss_library_step2_2 amboss_percent_step2_2
1 0 NA NA
2 0 NA NA
3 0 NA NA
4 0 NA NA
5 0 NA NA
6 0 NA NA
7 0 NA NA
8 0 NA NA
9 0 NA NA
10 0 NA NA
11 0 NA NA
12 1 1 41
13 0 NA NA
amboss_amount_step2_2 pathoma_step2_2 bnb_step2_2 bnb_how_step2_2___1
1 NA NA NA 0
2 NA NA NA 0
3 NA NA NA 0
4 NA NA NA 0
5 NA NA NA 0
6 NA NA NA 0
7 NA NA NA 0
8 NA NA NA 0
9 NA NA NA 0
10 NA NA NA 0
11 NA NA NA 0
12 3 1 1 1
13 NA NA NA 0
bnb_how_step2_2___2 bnb_how_step2_2___3 bnb_how_step2_2___4 length_step2_2
1 0 0 0 4
2 0 0 0 3
3 0 0 0 5
4 0 0 0 5
5 0 0 0 NA
6 0 0 0 NA
7 0 0 0 NA
8 0 0 0 NA
9 0 0 0 4
10 0 0 0 4
11 0 0 0 NA
12 0 0 0 4
13 0 0 0 4
practicetest_step2_2 practice_test_amount_step2_2 full_test_practice_step2_2
1 4 1 3
2 3 3 4
3 5 2 4
4 4 1 3
5 NA NA
6 NA NA
7 NA NA
8 <NA> NA NA
9 3 1 3
10 3 1 1
11 <NA> NA NA
12 3 3 3
13 5 1 1
final_practice_step2_2 practice_test_step2_2 practice_score_step2_2
1 0 NA
2 1 7 251
3 1 3 243
4 1 7 242
5 NA NA
6 NA NA
7 NA NA
8 NA NA <NA>
9 0 NA <NA>
10 0 NA <NA>
11 NA NA <NA>
12 0 NA <NA>
13 1 5 235
score_step2_2 retake_step2_2 retake_pass_step2_2 target_score_step2_2
1 239 NA NA 3
2 250 NA NA 1
3 no score yet NA NA 1
4 244 NA NA 1
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 n/a NA NA 1
10 253 NA NA 1
11 <NA> NA NA NA
12 266 NA NA 2
13 230 NA NA 3
study_amount_step2_2
1 2
2 1
3 2
4 1
5 NA
6 NA
7 NA
8 NA
9 1
10 2
11 NA
12 1
13 2
changes_step2_2
1
2
3
4
5
6
7
8
9 Dedicated time exclusive to step 2 post clerkship with step 1 completed pre-clerkship.
10
11
12 Simulate a full day exam before taking the test. I had done half length practice tests leading up to it but found that I got fatigued on the test day for the last three sections.
13 Studied more for Step 1. Would have done Anki in the months leading up to the exam.
step_2_second_complete uniqueID
1 2 VUSM 7
2 2 VUSM 12
3 2 VUSM 39
4 2 VUSM 51
5 0 VUSM 9
6 0 VUSM 20
7 0 VUSM 57
8 0 UVASOM 12
9 2 UVASOM 67
10 2 UVASOM 79
11 0 HMS 30
12 2 HMS 55
13 2 HMS 58
Code
step1_complete$push_practice_test_step1 <-factor(step1_complete$push_practice_test_step1, levels =c(1:8), labels =c("NBME 25", "NBME 26", "NBME 27", "NBME 28", "NBME 29", "NBME 30", "UWorld 1", "UWorld 2"))units(step1_complete$push_uw_practice_score_step1) <-"score units"units(step1_complete$push_nbme_practice_score_step1) <-"percent"label(step1_complete$push_practice_test_step1) <-"exam that triggered pushing back"label(step1_complete$push_nbme_practice_score_step1) <-"NBME P(passing) that triggered pushing back"label(step1_complete$push_uw_practice_score_step1) <-"3 digit UWorld score that triggered pushing back"caption <-"Description of indiviuals that pushed back Step 1"table1(~ push_practice_test_step1 + push_uw_practice_score_step1 + push_nbme_practice_score_step1 |push_step1_label, data=step1_complete, topclass="Rtable1-zebra")
Yes (N=62)
No (N=475)
Overall (N=537)
exam that triggered pushing back
NBME 25
0 (0%)
0 (0%)
0 (0%)
NBME 26
0 (0%)
0 (0%)
0 (0%)
NBME 27
3 (4.8%)
0 (0%)
3 (0.6%)
NBME 28
3 (4.8%)
0 (0%)
3 (0.6%)
NBME 29
0 (0%)
0 (0%)
0 (0%)
NBME 30
0 (0%)
0 (0%)
0 (0%)
UWorld 1
0 (0%)
0 (0%)
0 (0%)
UWorld 2
3 (4.8%)
0 (0%)
3 (0.6%)
Missing
53 (85.5%)
475 (100%)
528 (98.3%)
3 digit UWorld score that triggered pushing back (score units)
Mean (SD)
180 (0)
NA (NA)
180 (0)
Median [Min, Max]
180 [180, 180]
NA [NA, NA]
180 [180, 180]
Missing
59 (95.2%)
475 (100%)
534 (99.4%)
NBME P(passing) that triggered pushing back (percent)
Mean (SD)
67.3 (17.7)
NA (NA)
67.3 (17.7)
Median [Min, Max]
70.0 [30.0, 87.0]
NA [NA, NA]
70.0 [30.0, 87.0]
Missing
37 (59.7%)
475 (100%)
512 (95.3%)
Descriptive statistics will be reported by school and total.
histogram of Step 2 scores
what resources are most widely used barplot
how long did students study barplot
number of practice tests histogram (among the people who answered the question)
## do a plot for all schools (in total) as well as Vanderbilt in particular## use a unified color schemeggplot(aes(x = score_step2), data= step2_complete) +geom_histogram() +theme_minimal() +xlab("Self Reported Step 2 Score") +ylab("Frequency") +labs(title ="Frequency of reported Step 2 Scores")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Code
#more data cleaning first first step 2 for the resouces they selected that they usedresources_step2 <-colSums(took_step2general[20:35])uworld <- resources_step2[1] + resources_step2[9]first_aid <- resources_step2[2] + resources_step2[10]anki <- resources_step2[3] + resources_step2[11]sketchy <- resources_step2[4] +resources_step2[12]amboss <- resources_step2[5] + resources_step2[13]pathoma <- resources_step2[6] + resources_step2[14]boards_and_beyond <- resources_step2[7] + resources_step2[15]other <- resources_step2[8] + resources_step2[16]totals <-c(uworld, first_aid, anki, sketchy, amboss, pathoma, boards_and_beyond, other)#same idea for step 1resources_step1 <-colSums(took_step1general[26:41])uworld1 <- resources_step1[1] + resources_step1[9]first_aid1 <- resources_step1[2] + resources_step1[10]anki1 <- resources_step1[3] + resources_step1[11]sketchy1 <- resources_step1[4] +resources_step1[12]amboss1 <- resources_step1[5] + resources_step1[13]pathoma1 <- resources_step1[6] + resources_step1[14]boards_and_beyond1 <- resources_step1[7] + resources_step1[15]other1 <- resources_step1[8] + resources_step1[16]totals1 <-c(uworld1, first_aid1, anki1, sketchy1, amboss1, pathoma1, boards_and_beyond1, other1)rdf <-data.frame(amount =c(totals, totals1), name =rep(names(totals), 2), step_exam =c(rep("Step 2", 8), rep("Step 1", 8) ))# http://www.sthda.com/english/wiki/ggplot2-barplots-quick-start-guide-r-software-and-data-visualization used this as a guide#what resources are most widely used? I want to sort this barplot in order of frequency but for some reason this wasn't working for me with reorder()ggplot(data=rdf, aes(x=name, y=amount, fill = step_exam)) +geom_bar(stat="identity", position=position_dodge())+theme_minimal() +coord_flip() +scale_fill_brewer(palette="Blues")
Code
## how long did the students study barplot# for step 1step1_complete$length_step1 <-factor(step1_complete$length_step1 , levels =c(1:6), labels =c("less than 1 week", "1-2 weeks", "3-4 weeks", "5-6 weeks", "7-8 weeks", "more than 8 weeks"))ggplot(data.frame(step1_complete), aes(x=length_step1)) +geom_bar() +theme_minimal() +xlab("Time") +ylab("Frequency") +labs(title ="How long did you study for Step 1 during a protected study period?")
Code
# for step 2ggplot(data.frame(step2_complete), aes(x=length_step2)) +geom_bar() +theme_minimal() +xlab("Time") +ylab("Frequency") +labs(title ="How long did you study for Step 2 during a protected study period?")
Code
## number of practice tests histogram (among the people who answered the question)ggplot(aes(x = number_of_practice_tests), data= step2_complete) +geom_histogram() +theme_minimal() +xlab("Number of tests") +ylab("Frequency") +labs(title ="How many total practice tests did you take before Step 2?")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(aes(x = practicetest_step1), data= step1_complete) +geom_histogram() +theme_minimal() +xlab("Number of tests") +ylab("Frequency") +labs(title ="How many total practice tests did you take before Step 1?")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
NBME and Dr. Legallo's videos and notes from preclinical
2
nbme practice exams
2
NBME practice tests
3
Online MedEd
3
Osmosis
2
Pixorize
2
Youtube
3
Code
#If you used other resources not listed above, would you use them again?step1_complete$other_hascontents <-as.integer(nchar(step1_complete$other_resources_step1) >0) #flagging the rows that have contents https://stackoverflow.com/questions/64744988/testing-to-see-if-characters-are-present-in-a-cell-in-ras_tibble(step1_complete[which(step1_complete$other_hascontents ==1), c("other_resources_step1", "other_step1") ])
# A tibble: 24 × 2
other_resources_step1 other_step1
<chr> <chr>
1 nbme practice exams yes
2 NBME practice tests Yes
3 Youtube Yes
4 Online MedEd Maybe
5 Pixorize Yes, helped immensely with biochemistry and im…
6 Dirty Medicine YouTube videos yes
7 Dirty Medicine Yes
8 nbme practice exams yes
9 NBME practice tests Yes
10 Youtube Yes
# ℹ 14 more rows
Code
# what would you change?step1_complete$change_hascontents <-as.integer(nchar(step1_complete$changes_step1) >0) as_tibble(step1_complete[which(step1_complete$change_hascontents ==1), "changes_step1" ])
# A tibble: 114 × 1
value
<chr>
1 "Just do UWorld and know First Aid well, that's all you really need."
2 "Didn't know how to note this above, but I did step 1 anki for the entirety …
3 "Though I probably could have studied much less, I did not find the '% likel…
4 "I would probably take the test after 2 weeks instead of 3"
5 "I would have taken it immediately after ending clerkships and then gone off…
6 "I took a practice test on Day 1 of studying, which I did not find helpful. …
7 "None"
8 "If anything, maybe one less week. It can definitely be done in 4 weeks but …
9 "I would have a shorter study period and focused more on step 2"
10 "I would have switched to timed, untutored mode on U world sooner. On the re…
# ℹ 104 more rows
Code
#dat$study_amount_step1_1
3 Appendix/notes
All the analyses are performed using the following:
R version 4.2.2 (2022-06-24); R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
---title: "USMLE Data Analysis"author: "Lisa Levoir"date: "`r format(Sys.time(), '%B %d, %Y')`"format: html: theme: flatly code-fold: true code-tools: true html-math-method: katex toc: true toc-depth: 3 fig-width: 13 fig-height: 10 toc-title: "Contents" number-sections: true self-contained: true self-contained-math: true smooth-scroll: true fontsize: 0.8em title-block-banner: true citation-location: margin include-after-body: graph_fold.htmleditor: visualengine: knitr---# Data import and cleaningIn this section I import the data and prepare it for analysis.```{r setup}#| warning: false#| message: false#| echo: fencedlibrary(tidyverse)library(plotly)library(dplyr)library(tidyr)library(knitr)library(table1) #Create HTML Tables of Descriptive Statistics https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html#library(OMTM1) #https://github.com/schildjs/OMTM1/library(Hmisc)library(rms) # Regression Modeling Strategies by Frank https://cran.r-project.org/web/packages/rms/index.htmllibrary(modelsummary) #Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready https://cran.r-project.org/web/packages/modelsummary/index.htmllibrary(scales) # The scales packages provides the internal scaling infrastructure used by ggplot2, and gives you tools to override the default breaks, labels, transformations and palettes. https://scales.r-lib.orglibrary(viridis) #colorslibrary(cowplot) #allows me to use plotgridlibrary(gridExtra) #adding tables to plotslibrary(visdat) #shows missing datalibrary(GGally) #makes pairs plotslibrary(sandwich) #for robust standard errorssetwd("/Users/lisalevoir/BIOS7351_Collab/data_project2") #this line I would need to run in the consoleknitr::opts_knit$set(root.dir ="/Users/lisalevoir/BIOS7351_Collab/github/BIOS_Collaboration/project_2_analysis") #now I set global options for knitting, I also had to toggle global options > R Markdown > evaluate chunks in current directory#import the datadat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/combined_data_203.csv")#to compare that the merging went as expectedVUMSdat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/DATA_VUMS.csv")HMSdat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/DATA_HMS.csv")UVAdat <-read.csv("/Users/lisalevoir/BIOS7351_Collab/data_project2/DATA_UVA.csv")```### Inclusion/ExclusionCriteria to exclude students who most likely took a scored exam:- Any PhD students (n = 2)- Any 5th year program students (n = 19)- M4 students at Vanderbilt (n = 5)- Students who did not complete either Step survey (n = 2)- Students who specifically stated they took a scored Step 1 (n=1)Based on our criteria we would exclude record IDs:- VUSM: 23, 26, 39, 40, 54, 3, 8, 12, 49, 60, 62, 64- HMS: 1, 21, 28, 30, 34, 37, 39, 41, 44, 47, 49, 61- UVA: 33, 47, 80, 81, 83```{r data_cleaning}#| warning: false#| message: false#| echo: falsedat <- dat %>%mutate(uniqueID =paste(school, record_id))VUMSdat <- dat %>%mutate(uniqueID =paste("VUSM", record_id))HMSdat <- dat %>%mutate(uniqueID =paste("HMS", record_id))UVAdat <- dat %>%mutate(uniqueID =paste("UVASOM", record_id))#the list of IDs we decided as a group to excludeexcludeVU <-c(23, 26, 39, 40, 54, 3, 8, 12, 49, 60, 62, 64)excludeHMS <-c(1, 21, 28, 30, 34, 37, 39, 41, 44, 47, 49, 61)excludeUVA <-c(33, 47, 80, 81, 83)'%!in%'<-function(x,y)!('%in%'(x,y)) #make a way to use the not in commandVU_in <-filter(VUMSdat, record_id %!in% excludeVU)H_in <-filter(HMSdat, record_id %!in% excludeHMS)UVA_in <-filter(UVAdat, record_id %!in% excludeUVA)#now I want to select the columns I'd like to include for all of my analysis (so they're in the proper order for a cbind). this will be relatively easy to come back to edit later, if needed. #first, remember to include a school identifierVU_in[,"schoolid"] <-"VU"UVA_in[,"schoolid"] <-"UVa"H_in[,"schoolid"] <-"HMS"############ plan for how I will get the data in a format I want:# - pull relevant columns by "starts with"# - confirm all column names match, then# - rbind together once# - then I can select from this sheet the questions relevant to Step 1 first with ends with "_1", and those who took Step 1 second with a "_2"############ Now pulling the common columns we're interested in as predictors and outcomes##note, for VU I removed ""number_other_courses_step1_1" and starts_with("other_courses_step1_1___1) because these questions were not on the other school surveystook_step1_VU <- VU_in %>%select(starts_with("record_id"),starts_with("uworld_percent_step1"),starts_with("amboss_percent_step1"),starts_with("length_step1"),starts_with("practicetest_step1"),starts_with("full_test_practice_step1"), # split into binary: Yes and I am glad I did, Yes and it was unnecessary, No and I wish I did, No and I am glad I did notstarts_with("push_step1"),starts_with("push_practice_test_step1"),starts_with("push_nbme_practice_score_step1"),starts_with("push_uw_practice_score_step1"),starts_with("final_nbme_practice_score_step1"),starts_with("final_uw_practice_score_step1"),starts_with("score_step1"),starts_with("resources_step1"),starts_with("other_resources_step1"),starts_with("other_step1"),starts_with("changes_step1"),"uniqueID","schoolid", "exam_order" ) took_step1_UVA <- UVA_in %>%select(starts_with("record_id"),starts_with("uworld_percent_step1"),starts_with("amboss_percent_step1"),starts_with("length_step1"),starts_with("practicetest_step1"),starts_with("full_test_practice_step1"),starts_with("push_step1"),starts_with("push_practice_test_step1"),starts_with("push_nbme_practice_score_step1"),starts_with("push_uw_practice_score_step1"),starts_with("final_nbme_practice_score_step1"),starts_with("final_uw_practice_score_step1"),starts_with("score_step1"),starts_with("resources_step1"),starts_with("other_resources_step1"),starts_with("other_step1"),starts_with("changes_step1"),"uniqueID","schoolid", "exam_order" )took_step1_H <- H_in %>%select(starts_with("record_id"),starts_with("uworld_percent_step1"),starts_with("amboss_percent_step1"),starts_with("length_step1"),starts_with("practicetest_step1"),starts_with("full_test_practice_step1"),starts_with("push_step1"),starts_with("push_practice_test_step1"),starts_with("push_nbme_practice_score_step1"),starts_with("push_uw_practice_score_step1"),starts_with("final_nbme_practice_score_step1"),starts_with("final_uw_practice_score_step1"),starts_with("score_step1"),starts_with("resources_step1"),starts_with("other_resources_step1"),starts_with("other_step1"),starts_with("changes_step1"),"uniqueID","schoolid", "exam_order" )## now rbinding the three schools togethertook_step1general <-rbind(took_step1_H, took_step1_UVA, took_step1_VU)## splitting the dataset so I also have reference sheets specific to step 1 first and step 1 secondtook_step1_first <- took_step1general %>%select(ends_with("_1"), "schoolid") #I actually haven't used this data frame in the analysistook_step1_second <- took_step1general %>%select(ends_with("_2"), "schoolid") #I actually haven't used this data frame in the analysis#this function will accommodate for the survey which split up the step 2 responses by whether it was taken first or second. I would like to run analysis (and gather percent missing) across all scores as they are available. I can add an indicator column later to identify people who took it first/second. The idea with this function is the output (called storage_df) should be easy to colbind onto the original data frame. Voila!#also if we decide to change our minds and include more variables it will be quick to run.merge_my_columns <-function(input_cols, source_df){storage_df <-as.data.frame(matrix(nrow =nrow(source_df), ncol =length(input_cols)))names(storage_df) <- input_colsfor(i in1:length(input_cols)) { cols <- source_df %>%select(starts_with(input_cols[i])) %>%names()print(cols) new_cobined_col <-coalesce(source_df[, cols[1]], source_df[, cols[2]]) storage_df[, i] <- new_cobined_col}print("Above is a list of columns I have combined for you. Hope it looks right!")return(storage_df)}######## combine results across exam order for Step 2class(took_step1general$practicetest_step1_1) <-"integer"#had to change class in order to coalesce theseclass(took_step1general$practicetest_step1_2) <-"integer"class(took_step1general$push_uw_practice_score_step1_2) <-"integer"class(took_step1general$push_uw_practice_score_step1_1) <-"integer"cols_step1 <-c("uworld_percent_step1", "amboss_percent_step1", "length_step1", "practicetest_step1","full_test_practice_step1", "push_step1", "push_practice_test_step1", "push_nbme_practice_score_step1", "push_uw_practice_score_step1" ,"final_nbme_practice_score_step1", "final_uw_practice_score_step1","resources_step1", "score_step1", "other_resources_step1", "other_step1","changes_step1")to_add <-merge_my_columns(input_cols = cols_step1, source_df = took_step1general)#table(to_add$score_step1) ## all step 1 scores have an outcome so there is no one to drop/filter outstep1_complete <-bind_cols(took_step1general, to_add) %>%select(record_id, uniqueID:changes_step1)```Data quality: there are ` r length(step1_complete$uniqueID)` unique individuals included in the step 1 data. Below is a table of their inclusion by school:```{r}kable(table(step1_complete$schoolid))``````{r data_cleaning_forstep2}#| warning: false#| message: false#| echo: falsetook_step2_VU <- VU_in %>%select(starts_with("record_id"),starts_with("uworld_percent_step2"),starts_with("amboss_percent_step2"),starts_with("length_step2"),starts_with("practicetest_step2"),starts_with("full_test_practice_step2"),starts_with("practice_score_step2"),starts_with("practice_test_step2"),starts_with("score_step2"),starts_with("target_score_step2"),starts_with("resources_step2"),"uniqueID","schoolid", "exam_order" ) #note, I removed ""number_other_courses_step1_1" and starts_with("other_courses_step1_1___1) because these questions were not on the other school surveystook_step2_UVA <- UVA_in %>%select(starts_with("record_id"),starts_with("uworld_percent_step2"),starts_with("amboss_percent_step2"),starts_with("length_step2"),starts_with("practicetest_step2"),starts_with("full_test_practice_step2"),starts_with("practice_score_step2"),starts_with("practice_test_step2"),starts_with("score_step2"),starts_with("target_score_step2"),starts_with("resources_step2"),"uniqueID","schoolid", "exam_order" )took_step2_H <- H_in %>%select(starts_with("record_id"),starts_with("uworld_percent_step2"),starts_with("amboss_percent_step2"),starts_with("length_step2"),starts_with("practicetest_step2"),starts_with("full_test_practice_step2"),starts_with("practice_score_step2"),starts_with("practice_test_step2"),starts_with("score_step2"),starts_with("target_score_step2"),starts_with("resources_step2"),"uniqueID","schoolid", "exam_order" )took_step2general <-rbind(took_step2_H, took_step2_UVA, took_step2_VU)took_step2_first <- took_step2general %>%select(ends_with("_1"), "schoolid", "uniqueID") #don't plan on using these, but available if neededtook_step2_second <- took_step2general %>%select(ends_with("_2"), "schoolid", "uniqueID")########combine results across exam order for Step 2class(took_step2general$practice_score_step2_2) <-"integer"#had to change class in order to coalesce theseclass(took_step2general$score_step2_1) <-"integer"class(took_step2general$score_step2_2) <-"integer"columns_to_summarize <-c("uworld_percent_step2", "amboss_percent_step2", "length_step2", "practicetest_step2","full_test_practice_step2", "practice_score_step2", "practice_test_step2", "resources_step2", "score_step2","target_score_step2")to_add <-merge_my_columns(input_cols = columns_to_summarize, source_df = took_step2general)#listing the people with no outcome reporteddrop_these_with_no_outcome <-bind_cols(took_step2general, to_add) %>%filter(is.na(score_step2)) %>%select("record_id", "schoolid", "uniqueID")#dropping the people with no outcomeintermediate_step2 <-bind_cols(took_step2general, to_add) %>%filter(!is.na(score_step2))#creating the complete data setstep2_complete <- intermediate_step2 %>%select(record_id, schoolid:target_score_step2)```Notice that unfortunately, we had to drop `r nrow(drop_these_with_no_outcome) ` individuals who did not report a step 2 score. These raw counts and frequencies of people who did not give a step 2 score (and are therefor not eligible for analysis) are listed by institution below.```{r}# describing the missingnesskable(table(drop_these_with_no_outcome$schoolid), caption ="Number of missing Step 2 scores by institution")num <-as.vector(table(drop_these_with_no_outcome$schoolid))denom <-as.vector(table(step2_complete$schoolid))nonresponse_freq <-setNames(c(round(num/denom, 3)), c(names(table(step2_complete$schoolid))))kable(nonresponse_freq, caption ="frequency of missing step 2 scores by instition")######### visually profile missing responsesp1 <- visdat::vis_miss(step1_complete)p2 <- visdat::vis_miss(step2_complete)title_gg <-ggdraw() +draw_label("Response missingness for pooled survey results across exam order")gridded <-plot_grid(p1, p2, label_size =12, ncol =2, align ="hv", label_x =0.5,labels =c("Step 1","Step 2"))plot_grid(title_gg, gridded, nrow =2, rel_heights =c(0.1, 0.9))```# AnalysisNote, we plan to use the robust/sandwich variance estimator for regression models. One inclusion criteria is that the outcome variable ("Y") must be available for a subject to be included in the analysis question (ie. if they did not report a step 2 score, we won't perform relevant step 2 analysis on them).::: panel-tabset## Q1 Does order have an impact on Step 2 scores?We cannot analyze Step 1 since all survey responses reported passingFor Step 2 scores, I will perform a linear regression with:- Y = Step 2 score- X = factor ( 1 = "step 1 first", 2 = "step 2 first", 3 = "only step 1", 4 = "only step 2")- Z = school (need to adjust for this)::: callout-notestill need to use robust se:::```{r}step2_complete[ ,"order_factor"] <-factor(step2_complete$exam_order, levels =c(1,2, 3, 4), labels =c("step 1 first", "step 2 first", "only step 1", "only step 2"))step2_complete[ ,"school_factor"] <-factor(step2_complete$schoolid, labels =c("HMS", "UVA", "VUMS"))table(step2_complete$exam_order)exam_order_mod <-lm(formula = score_step2 ~ order_factor + school_factor, data = step2_complete)sandwich(exam_order_mod) #this gives the sandwich variancesummary(exam_order_mod) #model summary for reporting```Based on the model output (p values), there doesn't appear to be any signifigant associations between exam order and the score for Step 2. Since the $R^2$ value is essentially 0, I conclude that there was no effect of exam order on step 2 scores. ## Q2 What factors affect Step 2 score?Again, we cannot analyze Step 1 scores since all respondents reported passing.Based on our SAP, if there are any covariates with more than 30% of responses missing, we will drop that variable or populate it with 0, depending on context. For example, the percent of Amboss questions completed will be filled with 0 for people who didn't answer since it seems safe to assume they didn't complete any of the Amboss questions. If less than 30% are missing, I may consider performing bootstrap sampling of known values to replace missing values.After accounting for missingness, I will assess for co-linearity of the predictors (ie. correlation) using VIF. If there is high co-linearity, we will use LASSO to perform variable selection. If there is no evidence of concerning levels of colinearity, I will proceed with linear regression.```{r data_profile_and_pairs_plots}#| warning: false######## profile missingness in the step 2 data and addresspercents_missing <-round(colSums(is.na(step2_complete))/nrow(step2_complete), 3)*100kable(percents_missing, caption ="Percent missing observations for pooled Step 2 survey")#inspecting percent missing, it seems like most responses are now complete except Amboss. Based on our study plan, I will populate those without a response for Amboss with 0'sstep2_complete$amboss_percent_step2 <-ifelse(is.na(step2_complete$amboss_percent_step2) ==TRUE, 0, step2_complete$amboss_percent_step2)class(step2_complete$practicetest_step2) <-"integer"step2_complete[,"on_target"] <-factor(step2_complete$target_score_step2, levels =c(1,2,3), labels =c("at target", "above target", "below target"))step2_complete[, "practice_test_2_clean"] <-ifelse(is.na(step2_complete$practicetest_step2) ==TRUE, NA, substr(step2_complete$practicetest_step2, start =1, stop =1))step2_complete[, "number_of_practice_tests"] <-as.numeric(step2_complete$practice_test_2_clean)```Multiple linear regression with: - Y = Step 2 score - X1 = % UWorld - X2 = % Amboss - X3 = length study - X4 = # of practice tests - X5 = full test day (yes/no code as binary) - X6 = final practice score (however, is there some conversion between U World and Amboss - Jeffrey is looking into this) - Z = School (need to adjust for this)Careful to note that not all cases are complete - for example there are 399 responses in the complete step 2 dataset, of which for the number of practice tests taken, 108 are missing and 291 have a response recorded.Below I report the model results, sandqich variance, and VIF for step 2 scores model.```{r}step2_complete$simulate_full_practice <-ifelse(step2_complete$full_test_practice_step2 <=2, 0, 1)step2_complete$simulate_full_practice <-factor(step2_complete$simulate_full_practice, levels =c(0, 1), labels =c("Yes", "No"))step2_complete$length_step2 <-factor(step2_complete$length_step2 , levels =c(1:6), labels =c("less than 1 week", "1-2 weeks", "3-4 weeks", "5-6 weeks", "7-8 weeks", "more than 8 weeks"))mod_step2_scores <-lm(score_step2 ~ uworld_percent_step2 + amboss_percent_step2 + length_step2 + simulate_full_practice + practice_score_step2 + number_of_practice_tests + school_factor, data = step2_complete)summary(mod_step2_scores)sandwich(mod_step2_scores)vif(mod_step2_scores)mod_step2_scores2 <-lm(score_step2 ~ uworld_percent_step2 + amboss_percent_step2 + length_step2 + simulate_full_practice + practice_score_step2 + practice_test_2_clean + school_factor, data = step2_complete)summary(mod_step2_scores2) #we need to talk about these model results```I looked into the difference between practice_test_step2 (the final practice test I took before my exam was... 8 options) and practicetest_step2 (text response of how many practice tests did you take before step 2). I decided not to include either in the model.::: callout-noteWe need to talk about the model results between the two ways I coded the variables for # of practice tests:::## Q3 What is associated (in this data) with pushing back a Step 1 exam date?Here, I will perform logistic regression withY = yes or no (1 = yes, 2 = no for "push_step1")There may not be sufficient data on this since only 20 people responded that they decided to push back Step 1. The factors that were measured are:- push remember step1 (1 = I only remember the form name, 2 = I only remember the score, 3 = I remember the form name and the score, 4 = I don't remember either) - we decided not to include this variable (can change later if desired)- push score only step 1 (1 = NBME, 2 = Uworld)- push practice test step 1 ( 1 - 8 listing various exams)- push nbme practice score (from 0 to 100%)- push uw practice score (from 180 to 300)Listing variables by name and if I have included them:- "push_step1_1" yes- "push_remember_step1_1" not included b/c a precursor question- "push_score_only_step1_1" not currently included but could be- "push_practice_test_step1_1" yes- "push_nbme_practice_score_step1_1" yes- "push_uw_practice_score_step1_1" yes```{r}step1_complete$push_step1 <-ifelse(step1_complete$push_step1 ==2|is.na(step1_complete$push_step1) ==TRUE, 2, 1) #recording the NA's to be "No" (they did not push back step 1)step1_complete$push_step1_label <-factor(step1_complete$push_step1, levels =c(1, 2), labels =c("Yes", "No")) #making a nice descriptive labeldid_push_df <- step1_complete %>%filter(push_step1_label =="Yes")dat %>%filter(!is.na(push_remember_step1_1))step1_complete$push_practice_test_step1 <-factor(step1_complete$push_practice_test_step1, levels =c(1:8), labels =c("NBME 25", "NBME 26", "NBME 27", "NBME 28", "NBME 29", "NBME 30", "UWorld 1", "UWorld 2"))units(step1_complete$push_uw_practice_score_step1) <-"score units"units(step1_complete$push_nbme_practice_score_step1) <-"percent"label(step1_complete$push_practice_test_step1) <-"exam that triggered pushing back"label(step1_complete$push_nbme_practice_score_step1) <-"NBME P(passing) that triggered pushing back"label(step1_complete$push_uw_practice_score_step1) <-"3 digit UWorld score that triggered pushing back"caption <-"Description of indiviuals that pushed back Step 1"table1(~ push_practice_test_step1 + push_uw_practice_score_step1 + push_nbme_practice_score_step1 |push_step1_label, data=step1_complete, topclass="Rtable1-zebra")```## Descriptive Statistics (still in progress)Descriptive statistics will be reported by school and total.- histogram of Step 2 scores- what resources are most widely used barplot- how long did students study barplot- number of practice tests histogram (among the people who answered the question)- summarize comments (other_resources, other_step, changes_step)```{r}## do a plot for all schools (in total) as well as Vanderbilt in particular## use a unified color schemeggplot(aes(x = score_step2), data= step2_complete) +geom_histogram() +theme_minimal() +xlab("Self Reported Step 2 Score") +ylab("Frequency") +labs(title ="Frequency of reported Step 2 Scores")#more data cleaning first first step 2 for the resouces they selected that they usedresources_step2 <-colSums(took_step2general[20:35])uworld <- resources_step2[1] + resources_step2[9]first_aid <- resources_step2[2] + resources_step2[10]anki <- resources_step2[3] + resources_step2[11]sketchy <- resources_step2[4] +resources_step2[12]amboss <- resources_step2[5] + resources_step2[13]pathoma <- resources_step2[6] + resources_step2[14]boards_and_beyond <- resources_step2[7] + resources_step2[15]other <- resources_step2[8] + resources_step2[16]totals <-c(uworld, first_aid, anki, sketchy, amboss, pathoma, boards_and_beyond, other)#same idea for step 1resources_step1 <-colSums(took_step1general[26:41])uworld1 <- resources_step1[1] + resources_step1[9]first_aid1 <- resources_step1[2] + resources_step1[10]anki1 <- resources_step1[3] + resources_step1[11]sketchy1 <- resources_step1[4] +resources_step1[12]amboss1 <- resources_step1[5] + resources_step1[13]pathoma1 <- resources_step1[6] + resources_step1[14]boards_and_beyond1 <- resources_step1[7] + resources_step1[15]other1 <- resources_step1[8] + resources_step1[16]totals1 <-c(uworld1, first_aid1, anki1, sketchy1, amboss1, pathoma1, boards_and_beyond1, other1)rdf <-data.frame(amount =c(totals, totals1), name =rep(names(totals), 2), step_exam =c(rep("Step 2", 8), rep("Step 1", 8) ))# http://www.sthda.com/english/wiki/ggplot2-barplots-quick-start-guide-r-software-and-data-visualization used this as a guide#what resources are most widely used? I want to sort this barplot in order of frequency but for some reason this wasn't working for me with reorder()ggplot(data=rdf, aes(x=name, y=amount, fill = step_exam)) +geom_bar(stat="identity", position=position_dodge())+theme_minimal() +coord_flip() +scale_fill_brewer(palette="Blues")## how long did the students study barplot# for step 1step1_complete$length_step1 <-factor(step1_complete$length_step1 , levels =c(1:6), labels =c("less than 1 week", "1-2 weeks", "3-4 weeks", "5-6 weeks", "7-8 weeks", "more than 8 weeks"))ggplot(data.frame(step1_complete), aes(x=length_step1)) +geom_bar() +theme_minimal() +xlab("Time") +ylab("Frequency") +labs(title ="How long did you study for Step 1 during a protected study period?")# for step 2ggplot(data.frame(step2_complete), aes(x=length_step2)) +geom_bar() +theme_minimal() +xlab("Time") +ylab("Frequency") +labs(title ="How long did you study for Step 2 during a protected study period?")## number of practice tests histogram (among the people who answered the question)ggplot(aes(x = number_of_practice_tests), data= step2_complete) +geom_histogram() +theme_minimal() +xlab("Number of tests") +ylab("Frequency") +labs(title ="How many total practice tests did you take before Step 2?")ggplot(aes(x = practicetest_step1), data= step1_complete) +geom_histogram() +theme_minimal() +xlab("Number of tests") +ylab("Frequency") +labs(title ="How many total practice tests did you take before Step 1?")## summarize comments (other_resources, other_step, changes_step)#"other_resources_step1" "other_step1" "changes_step1" kable(table(step1_complete$other_resources_step1), caption ="Other listed resources for Step 1")#If you used other resources not listed above, would you use them again?step1_complete$other_hascontents <-as.integer(nchar(step1_complete$other_resources_step1) >0) #flagging the rows that have contents https://stackoverflow.com/questions/64744988/testing-to-see-if-characters-are-present-in-a-cell-in-ras_tibble(step1_complete[which(step1_complete$other_hascontents ==1), c("other_resources_step1", "other_step1") ])# what would you change?step1_complete$change_hascontents <-as.integer(nchar(step1_complete$changes_step1) >0) as_tibble(step1_complete[which(step1_complete$change_hascontents ==1), "changes_step1" ])#dat$study_amount_step1_1```:::# Appendix/notesAll the analyses are performed using the following:- R version 4.2.2 (2022-06-24); R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.- Harrell Jr FE (2022). rms: Regression Modeling Strategies. R package version 6.3-0, <https://CRAN.R-project.org/package=rms>.The table below lists packages used in this document.```{r}subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion))```